A Review of Current Software for Handling Missing Data

Author

  • Joop J. Hox
Abstract

When we deal with a large data set with missing data, we have to undertake two important tasks. First, it is important to inspect the pattern of missingness. This can provide very practical information. For instance, we may find that most of the missing values concern only one specific variable. If this variable is not central to our analysis problem, we may delete it from our analysis, rather than keeping it and therefore having to delete many cases. We would also like to know whether the missingness forms a pattern, or whether it is related to some of our observed variables. If we discover a systematic pattern in the missingness, we may try to include it in our statistical analyses. The second task is to produce sound estimates of the parameters of interest, despite the incompleteness of the data. There are two major approaches to this problem. One is to make the data complete by imputing the missing values, and then do the analysis on the completed data. The other is to use a method, typically a likelihood-based procedure, that allows us to model incomplete data directly. Modern software assists us in both tasks. For inspecting the pattern of missingness, we can use either the standard procedures in a statistical package like SPSS or the specialized SPSS procedure Missing Value Analysis (MVA). For imputation and direct modeling of incomplete data, we need specialized software. This contribution reviews some of the options in generally available software like SPSS MVA, SOLAS, and NORM.
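The first task, inspecting the pattern of missingness, can be illustrated with a minimal sketch. It is not the SPSS MVA procedure or any of the reviewed packages. It uses only the Python standard library, and the toy data set, the variable names, and the helper functions (`missing_counts`, `missing_patterns`, `mean_by_missingness`) are all hypothetical, chosen for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy data set; None marks a missing value.
rows = [
    {"age": 23, "income": 31000, "score": 4.0},
    {"age": 35, "income": None, "score": 3.5},
    {"age": 41, "income": None, "score": None},
    {"age": 29, "income": 42000, "score": 4.5},
    {"age": 52, "income": None, "score": 2.0},
    {"age": 31, "income": 38000, "score": None},
]
variables = ["age", "income", "score"]


def missing_counts(rows, variables):
    """Count the missing values per variable."""
    return {v: sum(r[v] is None for r in rows) for v in variables}


def missing_patterns(rows, variables):
    """Tabulate missingness patterns: 'X' = observed, '.' = missing."""
    return Counter(
        "".join("." if r[v] is None else "X" for v in variables)
        for r in rows
    )


def mean_by_missingness(rows, target, observed):
    """Mean of `observed`, split by whether `target` is missing.

    A large gap suggests the missingness on `target` is related to
    `observed`. Assumes both groups are non-empty.
    """
    miss = [r[observed] for r in rows
            if r[target] is None and r[observed] is not None]
    obs = [r[observed] for r in rows
           if r[target] is not None and r[observed] is not None]
    return {"target_missing": mean(miss), "target_observed": mean(obs)}


counts = missing_counts(rows, variables)
patterns = missing_patterns(rows, variables)
age_split = mean_by_missingness(rows, "income", "age")

print(counts)                   # which variable carries most of the gaps
print(patterns.most_common())   # how often each pattern occurs
print(age_split)                # is missing income related to age?
```

In this toy example most gaps fall on `income`, and the cases with missing income are older on average, which is the kind of relation between missingness and an observed variable that the abstract suggests looking for before choosing an imputation or likelihood-based approach.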




Publication date: 2000